The NLMS algorithm with time-variant optimum stepsize derived from a Bayesian network perspective
In this article, we derive a new stepsize adaptation for the normalized least
mean square algorithm (NLMS) by describing the task of linear acoustic echo
cancellation from a Bayesian network perspective. Similar to the well-known
Kalman filter equations, we model the acoustic wave propagation from the
loudspeaker to the microphone by a latent state vector and define a linear
observation equation (to model the relation between the state vector and the
observation) as well as a linear process equation (to model the temporal
progress of the state vector). Based on additional assumptions on the
statistics of the random variables in observation and process equation, we
apply the expectation-maximization (EM) algorithm to derive an NLMS-like filter
adaptation. By exploiting the conditional independence rules for Bayesian
networks, we reveal that the resulting EM-NLMS algorithm has a stepsize update
equivalent to the optimal-stepsize calculation proposed by Yamamoto and
Kitayama in 1982, which has been adopted in many textbooks. As the main difference,
the instantaneous stepsize value is estimated in the M step of the EM algorithm
(instead of being approximated by artificially extending the acoustic echo
path). The EM-NLMS algorithm is experimentally verified for synthesized
scenarios with both white noise and male speech as input signals. Comment: 4 pages, 1 page of reference
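To make the setting concrete, the following is a minimal sketch of a conventional NLMS echo canceller with a fixed stepsize. It is not the EM-NLMS algorithm of the abstract, which estimates a time-variant stepsize in the M step. The function names, the four-tap echo path, the noise level, and the stepsize value mu are all assumptions chosen for illustration.

```python
import random


def nlms_echo_canceller(x, d, num_taps, mu=0.5, eps=1e-8):
    """Fixed-stepsize NLMS (illustrative; not the EM-NLMS stepsize rule).

    x: loudspeaker samples, d: microphone samples.
    Returns the final filter estimate and the error signal.
    """
    w = [0.0] * num_taps          # estimate of the echo path
    buf = [0.0] * num_taps        # most recent loudspeaker samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))     # estimated echo
        e = dn - y                                     # residual after cancellation
        norm = sum(b * b for b in buf) + eps           # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


def simulate(num_samples=2000, seed=0):
    """Synthetic white-noise scenario with a hypothetical 4-tap echo path."""
    rng = random.Random(seed)
    h = [0.8, -0.4, 0.2, 0.1]     # assumed echo path, for illustration only
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d = []
    for n in range(num_samples):
        echo = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        d.append(echo + 0.01 * rng.gauss(0.0, 1.0))    # small observation noise
    w, errors = nlms_echo_canceller(x, d, num_taps=len(h))
    return h, w, errors


if __name__ == "__main__":
    h, w, _ = simulate()
    print("true path:", h)
    print("estimate: ", [round(wi, 3) for wi in w])
```

In this sketch mu stays constant over time; a time-variant stepsize would replace that constant with a per-sample value.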
A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition
This article provides a unifying Bayesian network view on various approaches
for acoustic model adaptation, missing feature, and uncertainty decoding that
are well-known in the literature of robust automatic speech recognition. The
representatives of these classes can often be deduced from a Bayesian network
that extends the conventional hidden Markov models used in speech recognition.
These extensions, in turn, can in many cases be motivated from an underlying
observation model that relates clean and distorted feature vectors. By
converting the observation models into a Bayesian network representation, we
formulate the corresponding compensation rules leading to a unified view on
known derivations as well as to new formulations for certain approaches. The
generic Bayesian perspective provided in this contribution thus highlights
structural differences and similarities between the analyzed approaches.
Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
We propose a spatial diffuseness feature for deep neural network (DNN)-based
automatic speech recognition to improve recognition accuracy in reverberant and
noisy environments. The feature is computed in real time from multiple
microphone signals without requiring knowledge or estimation of the direction
of arrival, and represents the relative amount of diffuse noise in each time
and frequency bin. It is shown that using the diffuseness feature as an
additional input to a DNN-based acoustic model leads to a reduced word error
rate for the REVERB challenge corpus, compared both to logmelspec features
extracted from noisy signals and to features enhanced by spectral subtraction. Comment: accepted for ICASSP201
Working Memory and Response Inhibition as One Integral Phenotype of Adult ADHD? A Behavioral and Imaging Correlational Investigation
Objective: It is an open question whether working memory (WM) and response inhibition (RI) constitute one integral phenotype in attention deficit hyperactivity disorder (ADHD). Method: The authors investigated 45 adult ADHD patients and 41 controls comparable for age, gender, intelligence, and education during a letter n-back and a stop-signal task, and measured prefrontal oxygenation by means of functional near-infrared spectroscopy. Results: The authors replicated behavioral and cortical activation deficits in patients compared with controls for both tasks and also for performance in both control conditions. In the patient group, 2-back performance was correlated with stop-signal reaction time. This correlation did not seem to be specific for WM and RI as 1-back performance was correlated with go reaction time. No significant correlations of prefrontal oxygenation between WM and RI were found. Conclusion: The authors' findings do not support the hypothesis of WM and RI representing one integral phenotype of ADHD mediated by the prefrontal cortex.
Pneumothorax detection in chest radiographs: optimizing artificial intelligence system for accuracy and confounding bias reduction using in-image annotations in algorithm training
OBJECTIVES Diagnostic accuracy of artificial intelligence (AI) pneumothorax (PTX) detection in chest radiographs (CXR) is limited by the noisy annotation quality of public training data and confounding thoracic tubes (TT). We hypothesize that in-image annotations of the dehiscent visceral pleura for algorithm training boost the algorithm's performance and suppress confounders. METHODS Our single-center evaluation cohort of 3062 supine CXRs includes 760 PTX-positive cases with radiological annotations of PTX size and inserted TTs. Three step-by-step improved algorithms (differing in algorithm architecture, training data from public datasets/clinical sites, and in-image annotations included in algorithm training) were characterized by area under the receiver operating characteristic curve (AUROC) in detailed subgroup analyses and referenced to the well-established "CheXNet" algorithm. RESULTS Performances of established algorithms exclusively trained on publicly available data without in-image annotations are limited to AUROCs of 0.778 and strongly biased towards TTs that can completely eliminate the algorithm's discriminative power in individual subgroups. In contrast, our final "algorithm 2", which was trained on fewer images but additionally with in-image annotations of the dehiscent pleura, achieved an overall AUROC of 0.877 for unilateral PTX detection with a significantly reduced TT-related confounding bias. CONCLUSIONS We demonstrated strong limitations of an established PTX-detecting AI algorithm that can be significantly reduced by designing an AI system capable of learning to both classify and localize PTX. Our results are aimed at drawing attention to the necessity of high-quality in-image localization in training data to reduce the risks of unintentionally biasing the training process of pathology-detecting AI algorithms.
KEY POINTS • Established pneumothorax-detecting artificial intelligence algorithms trained on public training data are strongly limited and biased by confounding thoracic tubes. • We used high-quality in-image annotated training data to effectively boost algorithm performance and suppress the impact of confounding thoracic tubes. • Based on our results, we hypothesize that even hidden confounders might be effectively addressed by in-image annotations of pathology-related image features.
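As an illustration of the evaluation metric named above, the following minimal sketch computes AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and repeats the computation per subgroup. The function names, the subgroup labels, and the toy scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for PTX-positive, 0 for PTX-negative.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(labels, scores, groups):
    """AUROC computed separately for each subgroup label (e.g. tube present or absent)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


if __name__ == "__main__":
    # Toy values for illustration only.
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.9]
    groups = ["no_tube"] * 4 + ["tube"] * 4
    print(auroc(labels, scores))
    print(auroc_by_subgroup(labels, scores, groups))
```

Comparing the per-subgroup values with the overall value is one simple way to see whether a confounder such as an inserted tube changes the discriminative power within a subgroup.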